AITopics

Country: North America > United States > New York (0.28)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Biomedical Informatics (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Neural Information Processing SystemsFeb-17-2026, 03:09:20 GMT

ProteinGym: Large-Scale Benchmarks for Protein Fitness Prediction and Design

Addressing these challenges requires scale. To that end we introduce ProteinGym, a large-scale and holistic set of benchmarks specifically designed for protein fitness prediction and design.

benchmark, bioinformatics, machine learning, (18 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > France (0.04)
(4 more...)

Genre: Research Report (0.67)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
(4 more...)

Technology:

Information Technology > Biomedical Informatics (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(3 more...)

Neural Information Processing SystemsFeb-16-2026, 10:56:33 GMT

89af8e4eb696738f2c9e589522968a09-Paper-Conference.pdf

dataset, information, likelihood, (13 more...)

Country:

North America > United States (0.14)
Europe > France (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)
North America > Canada (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing SystemsFeb-11-2026, 22:41:54 GMT

f51338d736f95dd42427296047067694-Paper.pdf

language model, protein, sequence, (17 more...)

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > France (0.04)
North America > United States > Virginia (0.04)

Genre: Research Report (0.69)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial IntelligenceNov-6-2025

NABench: Large-Scale Benchmarks of Nucleotide Foundation Models for Fitness Prediction

Li, Zhongmin, Ma, Runze, Tan, Jiahao, Tan, Chengzi, Zheng, Shuangjia

Nucleotide sequence variation can induce significant shifts in functional fitness. Recent nucleotide foundation models promise to predict such fitness effects directly from sequence, yet heterogeneous datasets and inconsistent preprocessing make it difficult to compare methods fairly across DNA and RNA families. Here we introduce NABench, a large-scale, systematic benchmark for nucleic acid fitness prediction. NABench aggregates 162 high-throughput assays and curates 2.6 million mutated sequences spanning diverse DNA and RNA families, with standardized splits and rich metadata. We show that NABench surpasses prior nucleotide fitness benchmarks in scale, diversity, and data quality. Under a unified evaluation suite, we rigorously assess 29 representative foundation models across zero-shot, few-shot prediction, transfer learning, and supervised settings. The results quantify performance heterogeneity across tasks and nucleic-acid types, demonstrating clear strengths and failure modes for different modeling choices and establishing strong, reproducible baselines. We release NABench to advance nucleic acid modeling, supporting downstream applications in RNA/DNA design, synthetic biology, and biochemistry. Our code is available at https://github.com/mrzzmrzz/NABench.

bioinformatics, large language model, machine learning, (19 more...)

2511.02888

Country: Asia (0.46)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Biomedical Informatics (0.93)

arXiv.org Artificial IntelligenceOct-30-2025

Augmenting Biological Fitness Prediction Benchmarks with Landscapes Features from GraphFLA

Huang, Mingyu, Zhou, Shasha, Li, Ke

Machine learning models increasingly map biological sequence-fitness landscapes to predict mutational effects. Effective evaluation of these models requires benchmarks curated from empirical data. Despite their impressive scales, existing benchmarks lack topographical information regarding the underlying fitness landscapes, which hampers interpretation and comparison of model performance beyond averaged scores. Here, we introduce GraphFLA, a Python framework that constructs and analyzes fitness landscapes from mutagensis data in diverse modalities (e.g., DNA, RNA, protein, and beyond) with up to millions of mutants. GraphFLA calculates 20 biologically relevant features that characterize 4 fundamental aspects of landscape topography. By applying GraphFLA to over 5,300 landscapes from ProteinGym, RNAGym, and CIS-BP, we demonstrate its utility in interpreting and comparing the performance of dozens of fitness prediction models, highlighting factors influencing model accuracy and respective advantages of different models. In addition, we release 155 combinatorially complete empirical fitness landscapes, encompassing over 2.2 million sequences across various modalities. All the codes and datasets are available at https://github.com/COLA-Laboratory/GraphFLA.

data mining, evolutionary algorithm, machine learning, (21 more...)

2510.24826

Country: Europe (0.92)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (0.92)

Industry:

Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
(3 more...)

Kmicikiewicz, Michal, Fortuin, Vincent, Szczurek, Ewa

ProSpero: Active Learning for Robust Protein Design Beyond Wild-Type Neighborhoods

arXiv.org Artificial IntelligenceOct-28-2025

Designing protein sequences of both high fitness and novelty is a challenging task in data-efficient protein engineering. Exploration beyond wild-type neighborhoods often leads to biologically implausible sequences or relies on surrogate models that lose fidelity in novel regions. Here, we propose ProSpero, an active learning framework in which a frozen pre-trained generative model is guided by a surrogate updated from oracle feedback. By integrating fitness-relevant residue selection with biologically-constrained Sequential Monte Carlo sampling, our approach enables exploration beyond wild-type neighborhoods while preserving biological plausibility. We show that our framework remains effective even when the surrogate is misspecified. ProSpero consistently outperforms or matches existing methods across diverse protein engineering tasks, retrieving sequences of both high fitness and novelty.

evolutionary algorithm, machine learning, natural language, (18 more...)

2505.22494

Country: North America > United States (0.92)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.46)

Neural Information Processing SystemsOct-10-2025, 08:48:33 GMT

DePLM: Denoising Protein Language Models for Property Optimization Zeyuan Wang 1,2 Keyan Ding 2 Ming Qin 1,2 Xiaotong Li1,2

Protein optimization is a fundamental biological task aimed at enhancing the performance of proteins by modifying their sequences. Computational methods primarily rely on evolutionary information (EI) encoded by protein language models (PLMs) to predict fitness landscape for optimization. However, these methods suffer from a few limitations.

dataset, information, likelihood, (13 more...)

Country:

North America > United States (0.14)
Europe > France (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)
North America > Canada (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing SystemsOct-9-2025, 07:33:46 GMT

cac723e5ff29f65e3fcbb0739ae91bee-Paper-Datasets_and_Benchmarks.pdf

benchmark, bioinformatics, machine learning, (18 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > France (0.04)
(4 more...)

Genre: Research Report (0.67)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
(4 more...)

Technology:

Information Technology > Biomedical Informatics (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(3 more...)

Pinheiro, Rodrigo Lankaites, Landa-Silva, Dario, Atkin, Jason

A Technique Based on Trade-off Maps to Visualise and Analyse Relationships Between Objectives in Optimisation Problems

arXiv.org Artificial IntelligenceOct-2-2025

Understanding the relationships between objectives in a multiobjective optimisation problem is important for developing tailored and efficient solving techniques. In particular, when tackling combinatorial optimisation problems with many objectives, that arise in real-world logistic scenarios, better support for the decision maker can be achieved through better understanding of the often complex fitness landscape. This paper makes a contribution in this direction by presenting a technique that allows a visualisation and analysis of the local and global relationships between objectives in optimisation problems with many objectives. The proposed technique uses four steps: First, the global pairwise relationships are analysed using the Kendall correlation method; then, the ranges of the values found on the given Pareto front are estimated and assessed; next, these ranges are used to plot a map using Gray code, similar to Karnaugh maps, that has the ability to highlight the trade-offs between multiple objectives; and finally, local relationships are identified using scatter plots. Experiments are presented for three combinatorial optimisation problems: multiobjective multidimensional knapsack problem, multiobjective nurse scheduling problem, and multiobjective vehicle routing problem with time windows . Results show that the proposed technique helps in the gaining of insights into the problem difficulty arising from the relationships between objectives.

artificial intelligence, evolutionary algorithm, machine learning, (18 more...)